Skip synchronized(unsafeTags) on owner-thread tag writes#11082
Skip synchronized(unsafeTags) on owner-thread tag writes#11082
Conversation
Spans are almost always written by a single thread, so the lock on every setTag/setMetric call is uncontended overhead. This adds a volatile tagWriteState check: if the current thread is the span's creating thread (STATE_OWNER), tag writes skip the lock entirely. Non-owner threads and post-finish writes take the lock and sticky-transition to STATE_SHARED. Long-running spans disable the optimization at construction since the writer thread may read tags on unfinished spans. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 59 metrics, 12 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~8d5c038a1d, baseline=1.62.0-SNAPSHOT~f064e18a6c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.061 s) : 0, 1060746
Total [baseline] (11.04 s) : 0, 11039555
Agent [candidate] (1.062 s) : 0, 1061936
Total [candidate] (11.065 s) : 0, 11064679
section appsec
Agent [baseline] (1.25 s) : 0, 1249580
Total [baseline] (11.11 s) : 0, 11110236
Agent [candidate] (1.253 s) : 0, 1252668
Total [candidate] (11.081 s) : 0, 11080803
section iast
Agent [baseline] (1.231 s) : 0, 1231355
Total [baseline] (11.391 s) : 0, 11390969
Agent [candidate] (1.222 s) : 0, 1222292
Total [candidate] (11.348 s) : 0, 11348410
section profiling
Agent [baseline] (1.192 s) : 0, 1192401
Total [baseline] (11.012 s) : 0, 11012053
Agent [candidate] (1.197 s) : 0, 1197363
Total [candidate] (11.047 s) : 0, 11046759
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~8d5c038a1d, baseline=1.62.0-SNAPSHOT~f064e18a6c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.233 ms) : 0, 1233
crashtracking [candidate] (1.224 ms) : 0, 1224
BytebuddyAgent [baseline] (633.695 ms) : 0, 633695
BytebuddyAgent [candidate] (635.536 ms) : 0, 635536
AgentMeter [baseline] (29.459 ms) : 0, 29459
AgentMeter [candidate] (29.604 ms) : 0, 29604
GlobalTracer [baseline] (249.927 ms) : 0, 249927
GlobalTracer [candidate] (250.035 ms) : 0, 250035
AppSec [baseline] (32.688 ms) : 0, 32688
AppSec [candidate] (32.09 ms) : 0, 32090
Debugger [baseline] (60.457 ms) : 0, 60457
Debugger [candidate] (60.617 ms) : 0, 60617
Remote Config [baseline] (596.675 µs) : 0, 597
Remote Config [candidate] (600.657 µs) : 0, 601
Telemetry [baseline] (8.134 ms) : 0, 8134
Telemetry [candidate] (8.239 ms) : 0, 8239
Flare Poller [baseline] (8.329 ms) : 0, 8329
Flare Poller [candidate] (7.717 ms) : 0, 7717
section appsec
crashtracking [baseline] (1.233 ms) : 0, 1233
crashtracking [candidate] (1.217 ms) : 0, 1217
BytebuddyAgent [baseline] (662.827 ms) : 0, 662827
BytebuddyAgent [candidate] (664.215 ms) : 0, 664215
AgentMeter [baseline] (12.071 ms) : 0, 12071
AgentMeter [candidate] (12.139 ms) : 0, 12139
GlobalTracer [baseline] (248.859 ms) : 0, 248859
GlobalTracer [candidate] (250.152 ms) : 0, 250152
IAST [baseline] (24.55 ms) : 0, 24550
IAST [candidate] (24.685 ms) : 0, 24685
AppSec [baseline] (185.074 ms) : 0, 185074
AppSec [candidate] (185.043 ms) : 0, 185043
Debugger [baseline] (66.02 ms) : 0, 66020
Debugger [candidate] (65.976 ms) : 0, 65976
Remote Config [baseline] (617.702 µs) : 0, 618
Remote Config [candidate] (616.528 µs) : 0, 617
Telemetry [baseline] (8.341 ms) : 0, 8341
Telemetry [candidate] (8.588 ms) : 0, 8588
Flare Poller [baseline] (3.525 ms) : 0, 3525
Flare Poller [candidate] (3.578 ms) : 0, 3578
section iast
crashtracking [baseline] (1.231 ms) : 0, 1231
crashtracking [candidate] (1.208 ms) : 0, 1208
BytebuddyAgent [baseline] (804.843 ms) : 0, 804843
BytebuddyAgent [candidate] (799.163 ms) : 0, 799163
AgentMeter [baseline] (11.485 ms) : 0, 11485
AgentMeter [candidate] (11.312 ms) : 0, 11312
GlobalTracer [baseline] (240.569 ms) : 0, 240569
GlobalTracer [candidate] (239.09 ms) : 0, 239090
IAST [baseline] (26.858 ms) : 0, 26858
IAST [candidate] (25.849 ms) : 0, 25849
AppSec [baseline] (29.086 ms) : 0, 29086
AppSec [candidate] (31.154 ms) : 0, 31154
Debugger [baseline] (63.114 ms) : 0, 63114
Debugger [candidate] (60.912 ms) : 0, 60912
Remote Config [baseline] (2.294 ms) : 0, 2294
Remote Config [candidate] (522.524 µs) : 0, 523
Telemetry [baseline] (12.117 ms) : 0, 12117
Telemetry [candidate] (13.059 ms) : 0, 13059
Flare Poller [baseline] (3.538 ms) : 0, 3538
Flare Poller [candidate] (3.488 ms) : 0, 3488
section profiling
crashtracking [baseline] (1.192 ms) : 0, 1192
crashtracking [candidate] (1.184 ms) : 0, 1184
BytebuddyAgent [baseline] (696.464 ms) : 0, 696464
BytebuddyAgent [candidate] (697.789 ms) : 0, 697789
AgentMeter [baseline] (9.158 ms) : 0, 9158
AgentMeter [candidate] (9.22 ms) : 0, 9220
GlobalTracer [baseline] (208.532 ms) : 0, 208532
GlobalTracer [candidate] (209.882 ms) : 0, 209882
AppSec [baseline] (33.092 ms) : 0, 33092
AppSec [candidate] (33.14 ms) : 0, 33140
Debugger [baseline] (66.066 ms) : 0, 66066
Debugger [candidate] (66.404 ms) : 0, 66404
Remote Config [baseline] (586.264 µs) : 0, 586
Remote Config [candidate] (575.457 µs) : 0, 575
Telemetry [baseline] (7.847 ms) : 0, 7847
Telemetry [candidate] (8.029 ms) : 0, 8029
Flare Poller [baseline] (3.567 ms) : 0, 3567
Flare Poller [candidate] (3.623 ms) : 0, 3623
ProfilingAgent [baseline] (94.073 ms) : 0, 94073
ProfilingAgent [candidate] (95.771 ms) : 0, 95771
Profiling [baseline] (94.676 ms) : 0, 94676
Profiling [candidate] (96.348 ms) : 0, 96348
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~8d5c038a1d, baseline=1.62.0-SNAPSHOT~f064e18a6c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.065 s) : 0, 1064881
Total [baseline] (8.864 s) : 0, 8864395
Agent [candidate] (1.057 s) : 0, 1056624
Total [candidate] (8.828 s) : 0, 8828009
section iast
Agent [baseline] (1.223 s) : 0, 1222942
Total [baseline] (9.54 s) : 0, 9540121
Agent [candidate] (1.228 s) : 0, 1227861
Total [candidate] (9.573 s) : 0, 9573282
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~8d5c038a1d, baseline=1.62.0-SNAPSHOT~f064e18a6c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.235 ms) : 0, 1235
crashtracking [candidate] (1.228 ms) : 0, 1228
BytebuddyAgent [baseline] (638.11 ms) : 0, 638110
BytebuddyAgent [candidate] (633.611 ms) : 0, 633611
AgentMeter [baseline] (29.605 ms) : 0, 29605
AgentMeter [candidate] (29.419 ms) : 0, 29419
GlobalTracer [baseline] (249.999 ms) : 0, 249999
GlobalTracer [candidate] (248.799 ms) : 0, 248799
AppSec [baseline] (32.671 ms) : 0, 32671
AppSec [candidate] (32.0 ms) : 0, 32000
Debugger [baseline] (59.781 ms) : 0, 59781
Debugger [candidate] (59.162 ms) : 0, 59162
Remote Config [baseline] (602.581 µs) : 0, 603
Remote Config [candidate] (625.305 µs) : 0, 625
Telemetry [baseline] (8.153 ms) : 0, 8153
Telemetry [candidate] (8.092 ms) : 0, 8092
Flare Poller [baseline] (8.359 ms) : 0, 8359
Flare Poller [candidate] (7.487 ms) : 0, 7487
section iast
crashtracking [baseline] (1.227 ms) : 0, 1227
crashtracking [candidate] (1.237 ms) : 0, 1237
BytebuddyAgent [baseline] (800.33 ms) : 0, 800330
BytebuddyAgent [candidate] (805.087 ms) : 0, 805087
AgentMeter [baseline] (11.374 ms) : 0, 11374
AgentMeter [candidate] (11.608 ms) : 0, 11608
GlobalTracer [baseline] (239.024 ms) : 0, 239024
GlobalTracer [candidate] (239.509 ms) : 0, 239509
AppSec [baseline] (32.129 ms) : 0, 32129
AppSec [candidate] (30.134 ms) : 0, 30134
Debugger [baseline] (61.718 ms) : 0, 61718
Debugger [candidate] (63.053 ms) : 0, 63053
Remote Config [baseline] (533.982 µs) : 0, 534
Remote Config [candidate] (536.316 µs) : 0, 536
Telemetry [baseline] (11.167 ms) : 0, 11167
Telemetry [candidate] (11.08 ms) : 0, 11080
Flare Poller [baseline] (3.441 ms) : 0, 3441
Flare Poller [candidate] (3.452 ms) : 0, 3452
IAST [baseline] (25.846 ms) : 0, 25846
IAST [candidate] (25.927 ms) : 0, 25927
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 1 performance regressions! Performance is the same for 19 metrics, 16 unstable metrics.
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~8d5c038a1d, baseline=1.62.0-SNAPSHOT~f064e18a6c
dateFormat X
axisFormat %s
section baseline
no_agent (17.443 ms) : 17268, 17617
. : milestone, 17443,
appsec (18.813 ms) : 18623, 19002
. : milestone, 18813,
code_origins (18.095 ms) : 17916, 18274
. : milestone, 18095,
iast (18.155 ms) : 17974, 18336
. : milestone, 18155,
profiling (18.425 ms) : 18245, 18605
. : milestone, 18425,
tracing (17.805 ms) : 17631, 17980
. : milestone, 17805,
section candidate
no_agent (18.409 ms) : 18224, 18593
. : milestone, 18409,
appsec (18.918 ms) : 18727, 19109
. : milestone, 18918,
code_origins (18.051 ms) : 17877, 18226
. : milestone, 18051,
iast (18.108 ms) : 17928, 18287
. : milestone, 18108,
profiling (18.507 ms) : 18323, 18691
. : milestone, 18507,
tracing (17.905 ms) : 17728, 18082
. : milestone, 17905,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~8d5c038a1d, baseline=1.62.0-SNAPSHOT~f064e18a6c
dateFormat X
axisFormat %s
section baseline
no_agent (1.226 ms) : 1215, 1238
. : milestone, 1226,
iast (3.396 ms) : 3350, 3441
. : milestone, 3396,
iast_FULL (6.086 ms) : 6024, 6148
. : milestone, 6086,
iast_GLOBAL (3.732 ms) : 3669, 3795
. : milestone, 3732,
profiling (2.243 ms) : 2222, 2265
. : milestone, 2243,
tracing (1.97 ms) : 1953, 1986
. : milestone, 1970,
section candidate
no_agent (1.228 ms) : 1216, 1239
. : milestone, 1228,
iast (3.361 ms) : 3312, 3409
. : milestone, 3361,
iast_FULL (5.978 ms) : 5917, 6039
. : milestone, 5978,
iast_GLOBAL (3.658 ms) : 3598, 3718
. : milestone, 3658,
profiling (2.346 ms) : 2325, 2367
. : milestone, 2346,
tracing (1.847 ms) : 1832, 1862
. : milestone, 1847,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~8d5c038a1d, baseline=1.62.0-SNAPSHOT~f064e18a6c
dateFormat X
axisFormat %s
section baseline
no_agent (15.265 s) : 15265000, 15265000
. : milestone, 15265000,
appsec (14.687 s) : 14687000, 14687000
. : milestone, 14687000,
iast (18.422 s) : 18422000, 18422000
. : milestone, 18422000,
iast_GLOBAL (18.194 s) : 18194000, 18194000
. : milestone, 18194000,
profiling (15.462 s) : 15462000, 15462000
. : milestone, 15462000,
tracing (14.752 s) : 14752000, 14752000
. : milestone, 14752000,
section candidate
no_agent (15.584 s) : 15584000, 15584000
. : milestone, 15584000,
appsec (14.739 s) : 14739000, 14739000
. : milestone, 14739000,
iast (18.429 s) : 18429000, 18429000
. : milestone, 18429000,
iast_GLOBAL (18.006 s) : 18006000, 18006000
. : milestone, 18006000,
profiling (15.03 s) : 15030000, 15030000
. : milestone, 15030000,
tracing (14.867 s) : 14867000, 14867000
. : milestone, 14867000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~8d5c038a1d, baseline=1.62.0-SNAPSHOT~f064e18a6c
dateFormat X
axisFormat %s
section baseline
no_agent (1.488 ms) : 1476, 1499
. : milestone, 1488,
appsec (2.557 ms) : 2502, 2612
. : milestone, 2557,
iast (2.278 ms) : 2210, 2347
. : milestone, 2278,
iast_GLOBAL (2.32 ms) : 2251, 2389
. : milestone, 2320,
profiling (2.106 ms) : 2052, 2161
. : milestone, 2106,
tracing (2.094 ms) : 2041, 2148
. : milestone, 2094,
section candidate
no_agent (1.493 ms) : 1481, 1504
. : milestone, 1493,
appsec (3.806 ms) : 3584, 4027
. : milestone, 3806,
iast (2.283 ms) : 2214, 2352
. : milestone, 2283,
iast_GLOBAL (2.327 ms) : 2257, 2397
. : milestone, 2327,
profiling (2.105 ms) : 2050, 2159
. : milestone, 2105,
tracing (2.083 ms) : 2029, 2136
. : milestone, 2083,
|
Add three targeted concurrency tests that exercise the exact cross-thread tag write pattern the JMH crossThread benchmark was measuring: - crossThreadSustainedNoCrash: 8 threads × 10k setTag on same span - ownerToSharedTransition: owner writes first, then 8 threads join - manySpansCrossThread: 10k short-lived spans tagged from 8 threads All pass, proving the production code handles cross-thread writes without NPE or structural corruption. Fix the crossThread benchmark: change SharedSpan @setup from Level.Invocation to Level.Iteration. With Level.Invocation, 8 threads raced to call setup() concurrently, causing NPE when state.span was transiently null between invocations — a benchmark harness bug, not a production code bug. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
dd-trace-core/src/test/java/datadog/trace/core/DDSpanContextConcurrencyTest.java
Show resolved
Hide resolved
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Looking at throughput numbers, there's a modest gain that might make this change worth the complexity if we can simplify it a bit.
The two main changes that I think we should try are...
- replacing the two volatiles with a single volatile for the owningThread
- introducing high-order helper functions that hide the locking complexity
Then all the duplicate code can go away, and the accessing code becomes something like...
accessTags(unsafeTags -> {
...
});
We just need to make sure not to introduce a capturing lambda, so we don't incur unneeded allocation.
Alas, creating a higher-order helper function in DDSpanContext proved more annoying than anticipated. I'm back to thinking if we're going to do this change, we should do it in OptimizedTagMap. At least in OptimizedTagMap, most methods are sugar around a few methods that just work with a single TagMap.Entry. |
Per review feedback, move the lock-skipping optimization from DDSpanContext into OptimizedTagMap itself. This keeps the optimization invisible to callers — DDSpanContext no longer needs synchronized blocks around tag operations, and developers adding new tag operations don't need to think about locking. OptimizedTagMap now has a volatile Thread ownerThread field. Core methods (getAndSet, getAndRemove, getEntry, putAll, forEach, copy, etc.) check ownership: owner thread skips the lock, non-owner threads synchronize and permanently transition to shared mode. DDSpanContext changes: removed all 27 synchronized(unsafeTags) blocks, added setOwnerThread(current) in constructor, transitionToShared() delegates to TagMap. Also adds @threads(8) JMH benchmark variants and 5 new concurrency tests (mixed read/write, fuzz, value consistency, finish race, concurrent metrics). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Right now, this change creates a correctness issue when LegacyTagMap is being used.
As far I can tell, I think this is case where we've reached the limits of AI coding.
Before we consider such a change, I think we first need to phase out LegacyTagMap. That can be done in a separate pull request.
But even then, I think this introduces an error prone / complicated ownership model for only a small gain in real user benchmarks. I don't think that's a good trade-off.
I think we might be able to use this PR as inspiration for a future change, but I think how it fits in needs to be thought through carefully.
Address reviewer feedback on the thread-ownership optimization: - Add TagMap.withLock(Runnable/Supplier) API for compound operations, replacing all synchronized(unsafeTags) in DDSpanContext with unsafeTags.withLock() — locking is now fully encapsulated in TagMap - Add missing ownership checks to OptimizedTagMap: size(), isEmpty(), containsValue(), freeze(), isFrozen(), toString(), immutableCopy() - Add revokeOwnership() helper to avoid redundant volatile writes on the cache line after transition to shared mode - withLock() revokes ownership before running the operation, closing the TOCTOU race where the owner thread could bypass the compound lock - Add self-synchronization to LegacyTagMap (synchronized on all key HashMap methods + withLock override) so it is safe without outer locks - Add TagMapConcurrencyTest: owner fast path, transition visibility, post-transition contention, withLock atomicity, size consistency - Expand DDSpanContextConcurrencyTest: compound atomicity for setSpanSamplingPriority, getTags snapshot consistency, transition + concurrent read safety Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SpotBugs flags *Impl methods in OptimizedTagMap for non-atomic access to the `size` and `frozen` fields. These methods run either on the owner thread (lock-free by design) or inside synchronized(this) via reentrant calls. Also fix null-safe containsValue using Objects.equals. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
What Does This Do
Optimizes span tag writes by skipping
synchronizedblocks when the creating (owner) thread writes tags — the ~95% common case. Introduces awithLock()API onTagMapfor compound operations that need multi-operation atomicity.Reworked based on reviewer feedback to address correctness issues and encapsulate all locking inside TagMap:
TagMap.withLock(Runnable/Supplier)replaces allsynchronized(unsafeTags)in DDSpanContext — locking is now fully encapsulated in TagMap, not spread across callersOptimizedTagMapgains ownership checks on all previously-unprotected methods (size(),isEmpty(),containsValue(),freeze(),isFrozen(),toString(),immutableCopy())revokeOwnership()helper avoids redundant volatile writes on the cache line after transitionwithLock()revokes ownership before running the operation, closing the TOCTOU raceLegacyTagMapgets self-synchronization on all key HashMap methodsMotivation
The
synchronized(unsafeTags)blocks in DDSpanContext add ~15-30ns per uncontended monitor enter/exit. With 15-30 tags per span across millions of spans, this overhead is meaningful. Since ~95% of tag writes happen on the creating thread, the owner-thread fast path avoids this cost.Additional Notes
tag: ai generatedwithLock()which always acquires the monitortransitionToShared(), all operations go throughsynchronized(this)— compound atomicity is preservedContributor Checklist
type:and (comp:orinst:) labelsclose,fix, or any linking keywords